Publication Title: Transcriptomic analyses reveal rhythmic and CLOCK-driven pathways in human skeletal muscle
Publication Date: 2018-04-16
Publication Journal: eLife
GEO ID: GSE108539
We will first nest our A1 and A2 documents in this notebook to continue our analysis.
We will nest our A1 document in this notebook to continue our analysis.
148 identifiers are missing.
148 out of 16093 identifiers have not been mapped, which accounts for 0.92% of the genes. To address the unidentified genes, we will attempt to use a tool called Biobtree [3]. It is an alternative to biomaRt that uses a different dataset, so we may be able to map some of the remaining unidentified genes.
Note: The code for the heatmap matrix of the unthresholded dataset is commented out and hidden because it takes too long to execute.
Most of our samples cluster together, though 4 outliers appear on the right of the graph.
The study associated with this dataset examines which genes are expressed at 6 different timepoints to get a sense of the genes involved in the human circadian rhythm.
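The percentage quoted above can be checked with one line of arithmetic. This is a minimal sketch; `n_missing`, `n_total`, and `pct_missing` are illustrative variable names that do not come from the original notebook.

```r
# Counts quoted in the text; the variable names are illustrative only.
n_missing <- 148
n_total <- 16093

# Percentage of identifiers without a mapping, rounded to two decimals.
pct_missing <- round(100 * n_missing / n_total, 2)
pct_missing
```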
Here, we will fit our data from each timepoint to a linear model. To do this, we will first create our experimental model design, grouping our samples by patient and timepoint.
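As a rough illustration of such a design, the sketch below builds a design matrix with base R's `model.matrix`. The sample sheet is hypothetical (2 patients and 3 timepoints rather than the 6 timepoints in the study), and the additive `~ patient + timepoint` formula is an assumption about how the grouping might be encoded, not necessarily the formula used in this analysis.

```r
# Hypothetical sample sheet: 2 patients x 3 timepoints (made-up labels).
samples <- data.frame(
  patient   = factor(rep(c("P1", "P2"), each = 3)),
  timepoint = factor(rep(c("T1", "T2", "T3"), times = 2))
)

# Additive design: an intercept (baseline P1 at T1) plus one column for
# each remaining patient level and each remaining timepoint level.
model_design <- model.matrix(~ patient + timepoint, data = samples)

dim(model_design)       # one row per sample, one column per coefficient
colnames(model_design)
```

Including the patient term lets the model account for baseline differences between individuals, so that the timepoint coefficients reflect changes within a patient.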
We have identified 1590 upregulated genes and 925 downregulated genes to be used in our gene enrichment analysis.
Platform Title: Illumina HiSeq 2500 (Homo sapiens)
Original submission date: Mar 14 2013
Last update date: Mar 27 2019
Organism: Homo sapiens
No. of GEO datasets that use this technology: 6149
No. of GEO samples that use this technology: 177574
First, we will generate our ranked list (from A2), omitting unneeded columns. We will also export this list of ranked genes for further analysis in the GSEA software [2], [4].
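# Keep only the gene symbol and rank columns, as needed for GSEA Preranked.
ranked_list <- qlf_output_hits_withgn[, c("hgnc_symbol", "rank")]
ranked_list[1:10, ] # preview the first 10 rows
ranked_list <- ranked_list[ranked_list$hgnc_symbol != "", ] # remove rows with no gene name
# Export as a tab-delimited .rnk file; note that 2:nrow(...) leaves out the first row.
write.table(ranked_list[2:nrow(ranked_list), ], file = "data/ranked_list.rnk", row.names = FALSE, sep = "\t", quote = FALSE)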
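The `rank` column is computed in A2 and is not shown here. One common convention for GSEA Preranked input is a signed -log10 p-value, and the sketch below illustrates it on made-up data. The data frame `hits`, its `logFC` and `PValue` columns, the gene names, and the ranking formula itself are all assumptions for illustration and may differ from what A2 actually does.

```r
# Hypothetical differential-expression results; all values are made up.
hits <- data.frame(
  hgnc_symbol = c("GENE_A", "GENE_B", "", "GENE_C"),
  logFC       = c(1.5, -2.0, 0.3, -0.1),
  PValue      = c(0.001, 0.0001, 0.5, 0.9)
)

# Assumed ranking metric: -log10(p-value) signed by the direction of change,
# so strongly upregulated genes get large positive scores.
hits$rank <- -log10(hits$PValue) * sign(hits$logFC)

# Drop rows without a gene symbol, then sort from most up- to most downregulated.
ranked <- hits[hits$hgnc_symbol != "", c("hgnc_symbol", "rank")]
ranked <- ranked[order(ranked$rank, decreasing = TRUE), ]

# Write to a temporary file here instead of data/ranked_list.rnk.
rnk_path <- tempfile(fileext = ".rnk")
write.table(ranked, file = rnk_path, row.names = FALSE, sep = "\t", quote = FALSE)
```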
We used the ranked list generated in the previous block and downloaded the GO Biological Process gene set file, which excludes GO annotations with the evidence codes IEA, ND, and RCA (http://download.baderlab.org/EM_Genesets/March_01_2020/Human/symbol/Human_GOBP_AllPathways_no_GO_iea_March_01_2020_symbol.gmt), from the Bader Lab downloads directory. The following settings were used for the GSEA Preranked analysis: This results in the following summary:
Here, na_pos corresponds to the upregulated genes and na_neg to the downregulated genes. We immediately notice that there are far more downregulated gene sets than upregulated ones.
[1] D. Merico, R. Isserlin, O. Stueker, A. Emili, and G. D. Bader, “Enrichment Map: A Network-Based Method for Gene-Set Enrichment Visualization and Interpretation,” PLoS ONE, vol. 5, no. 11, p. e13984, Nov. 2010, doi: 10.1371/journal.pone.0013984.
[2] A. Subramanian et al., “Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles,” Proceedings of the National Academy of Sciences, vol. 102, no. 43, pp. 15545–15550, Oct. 2005, doi: 10.1073/pnas.0506580102.
[3] J. Reimand et al., “Pathway enrichment analysis and visualization of omics data using g:Profiler, GSEA, Cytoscape and EnrichmentMap,” Nat Protoc, vol. 14, no. 2, pp. 482–517, Feb. 2019, doi: 10.1038/s41596-018-0103-9.
[4] V. K. Mootha et al., “PGC-1α-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes,” Nat Genet, vol. 34, no. 3, pp. 267–273, Jul. 2003, doi: 10.1038/ng1180.
[5] L. Perrin et al., “Transcriptomic analyses reveal rhythmic and CLOCK-driven pathways in human skeletal muscle,” eLife, vol. 7, p. e34114, Apr. 2018, doi: 10.7554/eLife.34114.